Gradient-Based Optimization of Markov Reward Processes

Authors

  • Peter Marbach
  • John N. Tsitsiklis
Abstract

We consider a discrete-time, finite-state Markov reward process that depends on a set of parameters. In earlier work, we proposed a class of (stochastic) gradient descent methods that tune the parameters in order to optimize the average reward, using a single (possibly simulated) sample path of the process of interest. The resulting algorithms can be implemented online, and have the property that the gradient of the average reward converges to zero with probability 1. There is a drawback, however, in that the updates can have a high variance, resulting in slow convergence. In this paper, we address this issue and propose two approaches to reduce the variance which, however, introduce an additional bias into the update direction. We derive bounds for the resulting bias term and characterize the asymptotic behavior of the gradient of the average reward. For one of the approaches considered, the magnitude of the bias term exhibits an interesting dependence on the mixing time of the underlying Markov chain. We use a call admission control problem to illustrate the performance of one of the algorithms.
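As a rough illustration (a minimal sketch, not the authors' exact algorithm), the update described above can be pictured as an online stochastic gradient step driven by a single sample path. The function online_gradient_step, the model-supplied grad_log_p, and the forgetting factor alpha below are illustrative assumptions; alpha = 1 corresponds to the unbiased but high-variance estimate, while alpha < 1 is one way to damp the variance at the price of a bias that depends on how quickly the chain mixes.

    def online_gradient_step(theta, z, eta, s, s_next, reward, grad_log_p,
                             alpha=0.99, step=1e-3, eta_step=1e-2):
        # Eligibility trace over the sample path; alpha = 1 gives the
        # unbiased, high-variance estimate, while alpha < 1 forgets the
        # distant past, trading variance for bias (illustrative choice).
        z = alpha * z + grad_log_p(theta, s, s_next)
        # Stochastic gradient step on the average reward, using the
        # running average-reward estimate eta as a baseline.
        theta = theta + step * (reward - eta) * z
        # Update the running estimate of the average reward.
        eta = eta + eta_step * (reward - eta)
        return theta, z, eta

Each call consumes one transition (s, s_next, reward) of the simulated path, so the scheme can run online with constant memory per step.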


Similar Articles

A State Aggregation Approach to Singularly Perturbed Markov Reward Processes

In this paper, we propose a single-sample-path-based algorithm with state aggregation to optimize the average reward of singularly perturbed Markov reward processes (SPMRPs) with large-scale state spaces. It is assumed that such a reward process depends on a set of parameters. Unlike other kinds of Markov chains, SPMRPs have their own hierarchical structure. Based on this special s...


Two Time-scale Gradient Approximation Algorithm for Adaptive Markov Reward Processes

In this paper, we study the stochastic optimization problem of adaptive Markov reward processes parameterized by two sets of parameters: adjustable parameters and unknown constant parameters. As the existing algorithms do not work well for this problem, we propose a novel two-time-scale gradient approximation algorithm. This new algorithm yields fast convergence, small sample path va...


Localization and a Distributed Local Optimal Solution Algorithm for a Class of Multi-Agent Markov Decision Processes

We consider discrete-time factorial Markov Decision Processes (MDPs) in a multiple-decision-maker environment under the infinite-horizon average-reward criterion, with a general joint reward structure but a factorial joint state transition structure. We introduce the concept of "localization," whereby a global MDP is localized for each agent so that each agent need only consider a local MDP defined only with...


Approximate Gradient Methods in Policy-Space Optimization of Markov Reward Processes

We consider a discrete-time, finite-state Markov reward process that depends on a set of parameters. We start with a brief review of (stochastic) gradient descent methods that tune the parameters in order to optimize the average reward, using a single (possibly simulated) sample path of the process of interest. The resulting algorithms can be implemented online, and have the property that the gr...




Publication date: 2000